
Journal of General Internal Medicine

Springer Science and Business Media LLC

Preprints posted in the last 30 days, ranked by how well they match Journal of General Internal Medicine's content profile, based on 20 papers previously published here. The average preprint has a 0.04% match score for this journal, so anything above that is already an above-average fit.

1
Medical discrimination and the selective erosion of institutional health trust: evidence from the Health Information National Trends Survey 6 and 7

Park, A.; Yin, L.; Wong, A.; Lee, C.; Choi, Y.

2026-06-09 public and global health 10.64898/2026.06.06.26355057 medRxiv
Top 0.1%
7.0%

Medical discrimination may alter how patients relate to health information sources following adverse care encounters. We examined whether discrimination experience is associated with selective erosion of institutional health trust and with compensatory digital health engagement, using nationally representative data from the Health Information National Trends Survey (HINTS) 6 (2022; n=6,252) and HINTS 7 (2024; n=7,278). Survey-weighted modified Poisson regression estimated prevalence ratios (PRs) for binary high-trust outcomes, and survey-weighted ordinary least squares estimated coefficients for continuous outcomes; jackknife replicate weights (50 replicates) provided variance estimates. Discrimination was associated with substantially lower probability of high trust in the healthcare system (PR=0.39; 95% CI 0.30-0.52) and physicians (PR=0.85; 95% CI 0.77-0.94), with no significant association for trust in scientists, government, family, or religious organisations. The clinical-institutional pattern replicated in HINTS 6, which additionally showed reduced trust in scientists for race/ethnicity-based discrimination. Contrary to a disengagement hypothesis, discrimination-exposed adults showed higher probability of online health information seeking (PR=1.06), health app use (PR=1.11), and online provider messaging (PR=1.13); these associations persisted after adjustment for trust in physicians. Discrimination was independently associated with lower health self-efficacy (b=-0.271). Medical discrimination selectively erodes trust in clinical institutions while leaving broader epistemic trust largely intact. Despite this, discrimination-exposed patients engage more actively with digital health channels, consistent with compensatory reorientation toward non-clinical information sources. These findings describe engaged but institutionally alienated patients, with implications for restoring clinical trust and for equity-centred digital health design.

2
Variation in Telehealth Use in a National Home Test-to-Treat Program for Acute Respiratory Infections

Losos, W.; Wang, B.; Fisher, K.; O'Connor, L.; Soni, A.; Gerber, B.

2026-05-26 health informatics 10.64898/2026.05.24.26353984 medRxiv
Top 0.1%
6.3%

Background: Home Test-to-Treat (HTTT) programs deliver timely antiviral treatment for acute respiratory infections, including COVID-19 and influenza, through at-home testing and telehealth. Because access is often measured by visit occurrence, variation in how and when care is delivered may be overlooked. We hypothesized that telehealth access follows distinct process-based patterns. Methods: We analyzed de-identified encounters from the national HTTT program (September 2023-July 2024); 6,213 of 8,160 eligible individuals remained after exclusions for missing data. Phenotypes were derived by k-means clustering of standardized variables capturing encounter timing, modality preference, process duration, and sociodemographic and digital access attributes. Ten-day surveys assessed symptom duration and healthcare utilization. Results: Three phenotypes emerged: Delayed/Disrupted Access (n = 1,537; 24.7%), Digitally Engaged but Socioeconomically Vulnerable (n = 1,460; 23.5%), and Mainstream Access and Efficient Utilization (n = 3,216; 51.8%). Mean process duration differed (15.93 [SD 3.84] vs 3.69 [3.31] vs 2.87 [2.41] hours; p < 0.001). Synchronous preference was lowest in the Digitally Engaged group (22.9%); antiviral prescribing was high (88.6%-91.9%). Among 10-day respondents (n = 1,023), symptom duration did not differ. Emergency department visits were most frequent in the Digitally Engaged group (2.3% vs 0.0% and 0.5%; p = 0.02) and urgent care in the Delayed/Disrupted group (5.8% vs 4.1% vs 2.0%; p = 0.02). Conclusions: Telehealth use in a national HTTT program formed distinct phenotypes defined by timing, modality, and care-process efficiency. Evaluating equity requires attention to how and when care is delivered, not simply whether it occurred.

3
Hanging on through Omicron, then what? A pre-exit baseline of the U.S. emergency nursing workforce, 2018 to 2022, with implications for the 2026 NSSRN cycle

Squire, K.

2026-06-08 nursing 10.64898/2026.06.07.26355097 medRxiv
Top 0.1%
4.5%

Background. The emergency department in the United States of America functions as a residual access point for healthcare and social services for populations including rural communities, the uninsured, mental health and addiction patients, and the unhoused. The workforce variable that determines unit function (experience density, the concentration of accumulated clinical judgment within a unit workforce) is not measured in hospital accounting systems. Objective. To document workforce composition changes in U.S. emergency nursing across the 2018 and 2022 cycles of the National Sample Survey of Registered Nurses (NSSRN), and to specify falsifiable predictions for the 2026 cycle. Methods. We analyzed NSSRN public-use files using a four-way ED definition extending Castner et al. (2024) and a hospital-bedside-restricted comparator. Variance estimation used jackknife replicate weights for 2018 and Successive Differences Replication for 2022. Burnout was operationalized using the Norful et al. (2023) leaving-reasons proxy across cycles, with sensitivity analysis using the 2022 direct burnout item. Results. A 15-year trajectory (2008-2022) documents progressive experience-density compression: the ED's 15+ year veteran cohort fell from 41.9% to 28.0% over the decade preceding the pandemic, a loss of nearly a third of the senior cohort and a 19.6% decline in mean experience density, before recovering modestly to 33.3% as veteran nurses remained through the pandemic acute phase, leaving the ED as the youngest hospital setting throughout. Hospital non-ED bedside nurses lost senior tenure between cycles (mean 15.65→14.06 years since first licensure; 15+ year share 43.5%→38.7%), while ED nurses retained their senior tail (mean 11.60→12.58). Burnout endorsement rose sharply in both populations (non-ED 27.3%→46.0%; ED 34.2%→61.2%), with the ED-vs-non-ED gap more than doubling. Controlling for tenure, ED status was not independently associated with burnout in 2018 (OR 1.15, 95% CI 0.83-1.59) but was strongly associated in 2022 (OR 1.92, 95% CI 1.44-2.55; p<.001). The direct burnout item showed a parallel pattern (OR 2.92, 95% CI 1.62-5.28). Conclusions. A pandemic-era setting-specific burnout effect emerged in emergency nursing that workforce-composition controls cannot explain. The 2022 cycle establishes a pre-exit baseline against which the 2026 NSSRN will serve as the falsifiable test of post-Omicron veteran exit. Nursing pipeline replacement lag exceeds the interval before 2026 data arrives; the consequences of inaction fall on populations dependent on ED-based residual access.

4
Asymmetric sociodemographic disparity in evidence-grounded clinical AI

Jia, E.; Omar, M.; Barash, Y.; Brook, O. R.; Ahmed, M.; Kruskal, J. B.; Gorenshtein, A.; Klang, E.

2026-05-15 health informatics 10.64898/2026.05.12.26353061 medRxiv
Top 0.2%
4.4%

AI-assisted clinical care may compound, rather than correct, existing health inequities. We applied Omar and colleagues' validated four-domain emergency-medicine benchmark to OpenEvidence (OE), a literature-grounded clinical LLM used by tens of thousands of US physicians daily, across 100 emergency-department cases and 20 sociodemographic labels. OE was consistent on the codified clinical decisions, triage, workup, and treatment, but diverged sharply on mental-health screening, where it flagged many historically marginalized groups between three and ten times more often than demographically unmarked cases. Cases labeled as unhoused received recommendations in 78 to 87 percent of responses (versus a 9 percent no-identifier-control rate); cases labeled as transgender in 22 to 24 percent; and Black transgender women specifically in 47 percent. A pre-registered audit of 193 free-text rationales localized the differential to the inner layer of the response, in the structure and tone of the rationale rather than the recommendation itself. Literature grounding may redistribute sociodemographic disparity in clinical AI rather than remove it. As clinical LLMs move toward agentic deployment, equity audits should examine how evidence is applied to each patient, not only whether citations are present.

5
Racial Disparities in Opioid Overdoses: A Comprehensive Claims-Based Analysis, 2020-2024

Pandey, A.

2026-05-12 addiction medicine 10.64898/2026.05.08.26352752 medRxiv
Top 0.2%
4.2%

Purpose: Opioid overdose deaths disproportionately affect racial and ethnic minority populations in the United States, yet claims-based evidence characterizing the multi-dimensional structure of these disparities across incidence, treatment access, costs, and insurance coverage remains limited. Methods: We conducted a retrospective cross-sectional and longitudinal cohort analysis using the HealthVerity Launch Sample, a large administrative claims database. The study population comprised 3,675,823 patients across 5 racial groups enrolled between 2020 and 2024. Eight primary analyses were conducted, including age-sex standardized overdose rates, temporal disparity trends, medication-assisted treatment (MAT) receipt, naloxone access, pharmacy costs, insurance payer type, care setting, and multivariable logistic regression for overdose risk. Results: Black patients had the highest age-sex standardized overdose rate (363.4 per 100,000; rate ratio [RR] = 1.27 vs. White), and those with opioid use disorder (OUD) received MAT at a rate 35% lower than White patients (19.8% vs. 30.7%; RR = 0.645), driven primarily by a buprenorphine access deficit. AIAN patients demonstrated consistent multi-dimensional disadvantage across naloxone access, MAT engagement, and pharmacy costs. After adjustment for payer type, age, and sex, all non-White groups showed lower adjusted odds of overdose than White patients (Black OR = 0.87; AIAN OR = 0.25), with Medicaid enrollment carrying 7.06 times the overdose odds of commercial insurance. Conclusion: Insurance type is the dominant predictor of overdose risk, and the disproportionate Medicaid enrollment of Black patients is both a consequence of structural disadvantage and access disparities. Targeted interventions such as buprenorphine expansion in Medicaid and enhanced naloxone distribution are recommended.

6
A longitudinal cohort study comparing clinical trials registered on ClinicalTrials.gov that stopped during the first three years of the SARS-CoV-2 pandemic with trials that stopped in the three years prior

Carlisle, B. G.; Hutchinson, N.; Moyer, H.

2026-05-22 public and global health 10.64898/2026.05.20.26353581 medRxiv
Top 0.2%
3.6%

Background: The global SARS-CoV-2 pandemic disrupted healthcare systems worldwide, raising concerns about its impact on clinical research. Early reports suggested reductions in participant enrollment, interruptions to ongoing trials, and challenges to protocol adherence, yet the magnitude and duration of these operational disruptions remain unclear. Methods: We conducted a registry-based analysis comparing clinical trials during the COVID-19 pandemic (December 2019 to November 2022) with a matched pre-pandemic cohort (December 2016 to November 2019). Studies were included if they reported any modifications to trial status, enrollment, or protocols during the study periods. Key variables included trial stoppage, enrollment changes, and adoption of remote or hybrid procedures. Results: The global SARS-CoV-2 pandemic resulted in widespread disruptions to trial operations with 13,323 clinical trials terminated, suspended or withdrawn over the course of the pandemic, a 38% increase compared to the 9,665 trials that stopped in the 3 years prior to the pandemic. Registries indicated a sharp decline in new participant enrollment across geographic regions and therapeutic areas, with partial recovery in later months. Review findings highlighted barriers including patient inaccessibility, staff redeployment, and supply chain interruptions. Conclusions: The pandemic caused system-wide operational shocks that compromised trial timelines and may have downstream methodological consequences. Recovery in enrollment does not imply restoration of pre-pandemic protocol fidelity or outcome ascertainment. Standardized reporting of disruptions, proactive contingency planning, and resilient trial designs are needed to maintain data integrity during large-scale disruptions and to support reliable evidence generation.

7
Faith Affiliation and Nursing Home Hospitalization Performance: Evidence from a National Stratified Sample

Swaroop, P.

2026-05-13 health systems and quality improvement 10.64898/2026.05.05.26352420 medRxiv
Top 0.2%
3.3%

Background and Objectives: Skilled nursing facility (SNF) hospitalization rates vary substantially across facilities serving comparable patient populations, yet the organizational factors underlying high performance remain poorly characterized. This study examines whether faith or mission-driven organizational identity is associated with lower-than-expected hospitalization rates in a national sample of Medicare-certified SNFs. Design: Cross-sectional analysis of a stratified random sample of 618 Medicare-certified SNFs, drawn from a national cohort of 13,419 facilities with claims-based quality data. Facilities were classified by organizational identity (faith-affiliated, purpose-driven, or secular) using publicly available records. Performance was measured using CMS claims-based hospitalization and emergency department transfer rates adjusted for expected rates given patient case mix. Setting and Participants: Medicare-certified skilled nursing facilities in the United States, February 2026 CMS release. Methods: We computed a composite performance gap as the mean of four z-scored observed-minus-expected measures (short-stay and long-stay hospitalization and ED transfer rates). We tested the association between faith affiliation and performance using Fisher's exact test, logistic regression, OLS regression, propensity score matching, and causal mediation analysis. Results: Faith-affiliated or purpose-driven facilities constituted 14.7% of significant overperformers (95% CI: 7.0-23.5%) and 0% of significant underperformers (95% CI: 0.0-4.4%), a monotonic gradient confirmed across all five performance zones. After propensity score matching on facility size, ownership type, and urbanicity (n=49 matched pairs), faith-affiliated facilities achieved 18.2% short-stay rehospitalization compared to 21.7% for matched secular facilities (3.5 percentage points fewer, p=0.019), and 1.30 long-stay hospitalizations per 1,000 resident-days compared to 1.71 (0.41 fewer per 1,000 days, p=0.019). Faith affiliation was associated with 61% more RN staffing hours per resident per day (0.96 vs. 0.60 hours, p<0.001), and formal mediation analysis confirmed that RN staffing hours substantially mediated the relationship between faith affiliation and hospitalization performance. Conclusions and Implications: Faith and mission-driven organizational identity is associated with superior hospitalization performance in a national SNF sample, mediated by elevated RN staffing intensity. These findings suggest that organizational culture and values are modifiable upstream determinants of nursing home quality, with implications for quality improvement, workforce policy, and value-based payment design.

8
Glycemic response trajectories on metformin monotherapy in real-world diabetes care

Raghavan, S.; Liu, W. G.; Ho, M. R.; Warsavage, T.; Ghosh, D.; Caplan, L.; Reusch, J. E.

2026-05-26 endocrinology 10.64898/2026.05.24.26353996 medRxiv
Top 0.3%
3.1%

Objectives: Diabetes affects over 500 million people globally and glycemia is inadequately managed. Metformin is the most frequently prescribed initial treatment for type 2 diabetes globally, yet glycemic response trajectories to metformin in routine real-world care and predictors of treatment response have not been well described. We aimed to identify glycemic response trajectories in adults prescribed metformin monotherapy as initial type 2 diabetes treatment and predictors of poor glycemic response to metformin. Design: Observational cohort study using latent class mixed models to identify hemoglobin A1c (HbA1c) trajectory classes, followed by random forests machine learning to predict trajectory class membership. Setting: US Veterans Affairs Healthcare System. Participants: Adults treated with metformin alone for >30 days after diabetes diagnosis with a minimum of two HbA1c measurements from 90 days prior to two years after the first metformin prescription (N=140,413). Exposures: Demographic, laboratory, vital sign, and comorbidity data were included as predictors of metformin response trajectory. Main Outcomes and Measures: We included all HbA1c measurements (487,604 total) for two years after metformin initiation to define metformin glycemic response trajectories. Results: We identified three HbA1c trajectories: stably low (89.7% of sample, mean HbA1c decrease from 7.2% to 6.6%), brisk response (7.1% of sample, mean HbA1c decrease from 11.4% to 7.0%), and non-response (3.1% of sample, mean HbA1c increase from 8.9% to 10.8%). Of those in the stably low and brisk response classes at 2 years, 91% maintained HbA1c at approximately 7% on metformin alone for 5 years after drug initiation. Prediction models could accurately predict brisk response (91% accuracy) but not metformin non-response (59% accuracy). Conclusions: Most individuals treated initially with metformin monotherapy have a beneficial and durable glycemic response. Predicting individuals who will not respond to metformin may be challenging but is evident within six months with recommended glycemic surveillance. The findings support current guidelines for HbA1c surveillance when initiating diabetes treatment.

9
Characteristics and Circumstances of US Overdose Deaths Identified as Heat-Related

Cano, M.; Mun, C. J.; Sweeney, K.; Daniulaityte, R.

2026-05-14 addiction medicine 10.64898/2026.05.11.26352941 medRxiv
Top 0.3%
3.0%

Objectives: To examine the extent to which heat-related causes of death are recorded in fatal drug overdoses, how these patterns vary across states and over time, and how overdose characteristics differ between deaths with, versus without, heat involvement recorded. Methods: Death certificate data for all drug overdose deaths in US residents from 2001 to 2024 (from the National Center for Health Statistics) were analyzed to identify whether a heat-related cause of death was also listed on the death certificate. Joinpoint regression, descriptive statistics, and nonparametric tests were used to examine temporal trends and compare overdose deaths with versus without recorded heat involvement. Results: In 2001, fewer than 10 drug overdose deaths with recorded heat involvement were identified, but this number increased to 558 in 2024. From 2013 to 2024, mortality rates increased significantly, with an estimated annual percent change of 30.1 (95% Confidence Interval, 26.5-47.1). The highest mortality rates and numbers of deaths were observed in residents of Arizona and Nevada. American Indian/Alaska Native, Mexican-heritage, and foreign-born populations accounted for larger shares of overdose deaths with, compared to without, heat involvement recorded. A street or highway was more frequently identified as the place of injury in overdose deaths with (18.9%), versus without (2.2%), heat involvement reported. Psychostimulants such as methamphetamine were involved in 85.9% of overdose deaths with, compared to 28.9% without, recorded heat involvement. Conclusions: Although representing only a fraction of all overdose deaths, fatal overdoses involving heat exposure have increased markedly over time and disproportionately impact certain states and demographic groups.

10
Sensor Geometry, Not Signal Processing, Limits Opportunistic Detection of Capillary-Refill-Like Signals by Rule-Based and Language-Model Methods in Archived ICU Waveforms

Landry, T. C.; Kim, Y.

2026-06-09 intensive care and critical care medicine 10.64898/2026.06.07.26355129 medRxiv
Top 0.3%
3.0%

Background. Capillary refill time is a resuscitation target in septic shock [1-4], but bedside measurement is examiner-dependent. An ICU monitor co-records a photoplethysmogram on the pulse oximeter and intermittent noninvasive blood pressure cuff cycles; if the probe and the cuff share a limb, each cycle is an unplanned vascular occlusion test on the distal microvascular bed. Standard practice places the two on opposite limbs. Objective. To measure how often, in MIMIC-IV-WDB v0.1.0, charted cuff cycles show the photoplethysmographic morphology expected of a same-limb cuff and probe, and to characterize the candidate capillary refill-like signal when that morphology is present. Methods. MIMIC-IV-WDB v0.1.0 [5] was linked to the MIMIC-IV clinical database [6]. A pre-registered rule-based detector identified candidate occlusion-reperfusion signatures on the 1-Hz perfusion-index envelope around each charted cuff timestamp. The primary endpoint was the proportion of cuff cycles suitable for analysis that were detector-positive at a 15-second reperfusion threshold, with 95% confidence intervals estimated by resampling patients at a fixed seed. A secondary analysis used a locally hosted multimodal language model (a Gemma-3 derivative on a non-device server) to adjudicate the same signature on perfusion-index plots; no MIMIC-IV-WDB content left the workstation. Results. Of 9,224 charted cuff cycles, 8,909 had a usable pulse-oximeter waveform, and 268 cycles in 15 patients (4.30% of the 6,236 cuff cycles suitable for analysis, 95% CI 2.60 to 6.03) met the primary 15-second threshold. The language model adjudicated the same cycles and called 1,367 of the 8,909 cycles with a usable waveform (15.34%) signature-present, roughly five times the detector's count. Because no laterality ground truth exists, agreement with a single blinded reader served as the comparator rather than accuracy. The two methods were about equally concordant with the reader: precision was 0.25 (95% CI 0.14 to 0.39) for the detector and 0.24 (95% CI 0.10 to 0.35) for the language model, although reweighting to the full population of cycles with a usable waveform lowered the language model to 0.030 (95% CI 0.009 to 0.053). These estimates are reference-limited: a blinded re-read of a 150-card subsample showed only moderate intra-rater reliability (Cohen κ 0.46 to 0.59) with systematic undercalling on the first pass, and rescoring against the corrected re-read roughly doubled precision for both methods. Conclusions. Opportunistic extraction of capillary refill-like signals from archived ICU pulse oximetry is limited in two distinct ways. First, sensor geometry limits how often the signal is recordable: cuff cycles rarely show the morphology expected of a same-limb cuff and probe pair, consistent with opposite-limb placement, so the bottleneck is geometry rather than signal processing. Second, the modest reliability of morphology adjudication limits how well any single flagged cycle can be confirmed: against a blinded reader the detector is a usable screen but a noisy confirmer, the reference is itself only moderately reliable, and the language model is no more concordant despite flagging many more cycles. The minority of cycles in which the morphology appears contain a candidate signal that may merit prospective study under controlled placement with laterality recorded.

11
Impact of pharmacist board certification on health outcomes of critically ill patients: An analysis of the Optimizing Pharmacist-Team Integration for ICU patient Management (OPTIM) study

Smith, S. E.; Henry, K.; Heavner, M.; Keedy, C.; Duong, H.; Chen, Z.; Chen, X.; OPTIM Investigator Team; Sikora, A.

2026-06-02 intensive care and critical care medicine 10.64898/2026.05.26.26353672 medRxiv
Top 0.3%
2.8%

BACKGROUND: Critical care pharmacists (CCPs) reduce adverse drug events (ADEs) and mortality in the intensive care unit (ICU). Board certification is the established professional standard for CCPs, but its impact on ICU patient outcomes, including its relationship with CCP characteristics and workload, remains unclear. The purpose of this study was to evaluate the association between pharmacist board certification, CCP workload characteristics, and patient outcomes. METHODS: This was a pre-planned analysis of the multicenter, observational Optimizing Pharmacist Team Integration for ICU Patient Management (OPTIM) study, including adult ICU patients cared for by CCPs. Patients cared for exclusively by board-certified pharmacists on every ICU day were categorized as the BCP group; those with at least one day of care from a non-board-certified pharmacist comprised the non-BCP group. The primary outcome was hospital mortality; secondary outcomes included the hazard of discharge alive (HDA) from the ICU and hospital. Multivariable logistic regression was used to evaluate the association between BCP and mortality; Fine-Gray competing risk models were used to assess the relationship between BCP and ICU and hospital HDA. RESULTS: A total of 201 pharmacists (184 BCPs; 17 non-BCPs) from 63 institutions caring for 20,537 ICU patients were included. Care provided exclusively by a BCP (vs. ≥1 day by a non-BCP) was associated with lower mortality (OR 0.80, 95% CI 0.69 to 0.92, p=0.002) and both a higher ICU HDA (HR 1.08, 95% CI 1.03 to 1.13, p<0.001) and hospital HDA (HR 1.19, 95% CI 1.13 to 1.26, p<0.001). CONCLUSION: Daily ICU care delivered by pharmacists with board certification was independently associated with reduced mortality and improved hazard of discharge alive from the ICU. Board-certified pharmacists may enhance the quality and/or efficiency of critical care pharmacy services. These findings support the role of board certification as a modifiable factor to improve patient outcomes and optimize workload in the ICU.

12
Increasing Efficiency, Persistent Burden: Longitudinal Analysis of EHR Use and After-Hours Work in Emergency Medicine Residency

Preiksaitis, C. M.; Hughes, J.; Iscoe, M.; Makutonin, M.; Rider, A.; Melnick, E.; Rose, C.

2026-05-21 medical education 10.64898/2026.05.19.26353524 medRxiv
Top 0.3%
2.7%

Objectives: Electronic Health Records (EHRs) impose a significant time burden on physicians, often requiring work to be completed outside of scheduled hours. While this burden is well-documented, how it evolves throughout emergency medicine (EM) residency remains poorly understood. This study aimed to quantify EHR usage patterns, analyze the composition of after-shift work, and characterize the development of EHR efficiency across EM training. Methods: We conducted a retrospective cohort study of EM residents (postgraduate year [PGY] 1-4) using 5.5 years of EHR audit log data (2020-2025) at a single academic institution. We analyzed EHR time per new patient encounter, stratified by postgraduate year, and categorized activities into domains such as documentation, chart review, and orders. EHR work was measured both during and after scheduled shifts. Results: The analysis included 144 unique residents and 167,010 new patient encounters across 15,386 shifts. Encounter-attributed EHR time per encounter decreased by 52% from PGY-1 to PGY-4 (median 19.9 to 9.6 minutes, p<0.001), despite an 86% increase in patient volume per shift (median 7 to 13 encounters). This efficiency gain was driven primarily by a 69% reduction in documentation time (9.3 to 2.9 minutes), accompanied by shorter notes. After-shift work (EHR activity after the 9-hour clinical shift) was present in 89.9-94.4% of encounters. At the shift level, combined after-shift EHR time (encounter-attributed plus tracking board) was a median of 64.2 minutes per shift for PGY-1 and 104.2 minutes for PGY-4. Shift-level tracking board activity dominated the after-shift burden and increased with training (median 40.2 to 79.0 minutes per shift from PGY-1 to PGY-4). Conclusions: EM residents achieve substantial gains in on-shift EHR efficiency, with the largest reductions observed in documentation time, accompanied by shorter notes and faster input speed. However, a persistent after-hours workload, dominated by administrative and patient flow tasks, suggests that (at least at this single institution) system-level factors--not just individual skill--may contribute to this pattern. Monitoring these objective EHR metrics may help programs identify struggling learners and evaluate the impact of interventions aimed at improving resident well-being and workflow efficiency.

13
Ambient AI Documentation in Mixed-Language Encounters: A Heuristic Evaluation of Spanish-English and Mandarin-English Conversations

Hu, D.; Flores, D.; Flores, L.; Chien, R.; Lam, K.; Chow, E.; Guo, Y.; Tam, S.; Perret, D.; Pandita, D.; Zheng, K.

2026-05-22 health informatics 10.64898/2026.05.19.26353603 medRxiv
Top 0.3%
2.6%

Ambient AI documentation systems rely on automatic speech recognition to transcribe patient-provider conversations before generating clinical notes. However, little empirical evidence exists on how these systems perform in mixed-language clinical encounters. We conducted a mixed-methods heuristic evaluation of an ambient AI documentation tool using 24 reenacted primary care conversations involving Spanish-English and Mandarin-English code-switching. Quantitative analyses measured mixed error rate (MER) and code-switching detection. Overall MER was low: the median was 4% in Spanish-English conversations, which also showed less variation, and 9% in Mandarin-English conversations, with outliers reaching 67%. The system generally detected language switches reliably, although deletions occurred frequently in Mandarin-English transcripts at switch points. Qualitative analysis revealed transcription errors related to phonetic similarity, automatic language translation, clinical terminology recognition, and language-specific challenges. These findings highlight considerations for improving ambient AI clinical documentation systems to support multilingual providers in delivering care for linguistically diverse populations.

14
Clinical Safety of AI-Generated Antibiotic Prescribing Advice: Guideline Adherence and Misinformation Risk Among Large Language Models

Khan, M. M.; Anwar, M. N.

2026-05-15 public and global health 10.64898/2026.05.13.26352828 medRxiv
Top 0.3%
2.1%

Background: Large language models (LLMs) are increasingly used in telehealth, but their safety in antibiotic prescribing remains uncertain, particularly in the presence of patient misinformation. Methods: A cross-sectional analytical study evaluated 5,000 responses from five chatbot models using 1,000 primary-care vignettes of mild infections. Guideline adherence, overprescribing, misinformation effects, and safety behaviors were assessed. Inappropriate prescriptions were classified using the WHO AWaRe framework. Results: Overall, 76.2% of responses were guideline-concordant, while 6.6% showed unprompted overprescribing and 17.2% were influenced by misinformation. Some models were more vulnerable to misinformation than others. Although most responses correctly noted that antibiotics do not treat viral infections, fewer advised consulting a doctor, and warnings against self-medication were rare. Many inappropriate prescriptions involved broad-spectrum antibiotics. Conclusion: LLMs show potential in telehealth but remain prone to misinformation and inappropriate prescribing. Stronger guideline integration and clinical oversight are necessary to ensure safe use. Keywords: antimicrobial stewardship; large language models; telehealth; antibiotic prescribing; misinformation; clinical safety

15
Discordance Between Perceived Health Information Competence and Cancer Prevention Knowledge in U.S. Adults: A Cross-Sectional Study

Lee, C. W.; Wong, A.; Yin, L.; Choi, Y.

2026-06-01 public and global health 10.64898/2026.05.28.26354370 medRxiv
Top 0.3%
2.1%

Background: Self-reported confidence in health information seeking does not reliably predict accurate health knowledge, yet the population-level distribution of this discordance and its demographic predictors have received limited direct study. This study aimed to identify and characterize a Confident-Incorrect phenotype among U.S. adults: individuals with high perceived health information competence who simultaneously hold inaccurate or fatalistic beliefs about cancer. Methods: Cross-sectional analysis of HINTS 7 (N = 7,278). A Confidence Index (3-item digital literacy composite; Cronbach's α = 0.674) and an Evidence-Consistent Knowledge Score (factual cancer knowledge minus a cancer fatalism composite; fatalism subscale α = 0.563) were computed and combined into a discordance framework. Median-split classification produced four phenotypes. Gaussian Mixture Model clustering with four components provided moderate independent validation (inter-method agreement = 65.2%). Survey-weighted multinomial logistic regression (n = 5,771; McFadden pseudo-R² = 0.129) examined phenotype predictors. Results: An estimated 20.3% of U.S. adults were classified as Confident-Incorrect. They reported confidence levels similar to Well-Informed adults (z = 0.72 vs. 0.82) but scored 2.8-fold lower on objective cancer knowledge (0.74 vs. 2.06 out of 4) and exhibited the highest cancer fatalism of any phenotype (3.17 vs. 1.65 out of 4). Only 14.3% correctly identified alcohol as a cancer risk factor (vs. 58.8% of Well-Informed adults). Cancer screening rates did not differ meaningfully across phenotypes. Lower education (OR = 0.754), Hispanic ethnicity (OR = 1.788), non-Hispanic Black race (OR = 1.893), higher social media use (OR = 1.097), and lower trust in scientists (OR = 0.749) independently predicted Confident-Incorrect membership. Conclusions: An estimated one in five U.S. adults is overconfident in health information competence while holding substantially inaccurate beliefs about cancer prevention. Cancer screening rates did not follow the expected gradient across phenotypes, a null finding that cautions against inferring immediate behavioral impact from observed belief gaps. Interventions targeting specific factual errors and cancer fatalism are more likely to reach this group than general health literacy programs.
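The median-split step described in the abstract above can be illustrated with a minimal sketch. It assumes unweighted medians and treats ties at the median as "low"; the abstract does not state either choice, and the study itself used survey weights. Only the "Confident-Incorrect" and "Well-Informed" labels come from the abstract; the other two labels, the function name, and the sample values are hypothetical.

```python
from statistics import median


def classify_phenotypes(confidence, knowledge):
    """Assign each respondent to one of four median-split cells.

    confidence: Confidence Index values, one per respondent.
    knowledge:  Evidence-Consistent Knowledge Score values, same order.
    """
    c_med = median(confidence)
    k_med = median(knowledge)
    labels = []
    for c, k in zip(confidence, knowledge):
        # Assumption: values equal to the median count as "low".
        high_c = c > c_med
        high_k = k > k_med
        if high_c and high_k:
            labels.append("Well-Informed")
        elif high_c:
            labels.append("Confident-Incorrect")
        elif high_k:
            labels.append("Unsure-Correct")    # hypothetical label
        else:
            labels.append("Unsure-Incorrect")  # hypothetical label
    return labels


# Made-up illustrative scores for five respondents.
confidence = [0.9, 0.8, -0.5, -0.7, 0.1]
knowledge = [2.0, -1.0, 1.5, -2.0, 0.0]
labels = classify_phenotypes(confidence, knowledge)
print(labels)
```

A median split forces roughly equal marginal halves on each axis, so the share in any one cell (such as the reported 20.3%) reflects how the two scores co-vary rather than a clinically chosen threshold.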

16
When Algorithms Prescribe: A Cross-Sectional Study of Quality, Misinformation, and Engagement in Statin-Related Content on TikTok

Gharibyan, I.; Ahner, E.; Shao, R.; Sharma, D.; Navarsartian Tazehkand, T.; Diep, J.; Assoumou, B.

2026-06-08 health informatics 10.64898/2026.06.04.26354962 medRxiv
Top 0.4%
2.0%

Background: Statins are key to preventing atherosclerotic cardiovascular disease and lowering low-density lipoprotein cholesterol and cardiovascular events. However, skepticism regarding their safety and value persists and is increasingly influenced by social media. TikTok has emerged as a major source of health information, but its content varies in quality and accuracy. This study evaluated the quality, attitudes, misinformation, and engagement of statin-related content on TikTok. Methods: Public TikTok videos were collected using predefined search terms and coded by creator type, thematic content, and overall attitude. Video quality was assessed using the DISCERN instrument, the Patient Education Materials Assessment Tool for Audiovisual Materials, and the Global Quality Score. False or misleading claims were independently reviewed by two cardiology fellows. Associations between engagement and quality were also examined. Results: Of 1,349 screened videos, 258 met inclusion criteria. Most were educational (91.0%), with non-physician healthcare providers (34.5%) as the largest creator group. Risks or negative effects were discussed more often than benefits (63.2% vs 42.2%), and 39.5% contained at least one false or misleading claim, most often from complementary and alternative medicine providers and wellness promoters. Quality differed by creator type across all instruments, with physician-created content scoring highest. Video popularity showed minimal association with informational quality. Conclusion: Statin-related TikTok content frequently emphasizes harms, often contains misinformation, and varies substantially in quality by creator type. Greater involvement of healthcare professionals on social media may help improve digital health literacy and counter misleading information about statin therapy.

17
Improving bystander automated external defibrillation application in Singapore: An 11-year population-based living-laboratory study

Bokman, J. T.; Singapore PAROS Investigators; Ee, S.; Fook-Chong, S. M. C.; Binte Ahmad, N. S.; Leong, B. S.; Chia, M. Y. C.; Okada, Y.; Ong, M. E. H.; Siddiqui, F. J.

2026-05-22 emergency medicine 10.64898/2026.05.20.26353744 medRxiv
Top 0.4%
2.0%

Background: Bystander automated external defibrillator (BAED) use improves out-of-hospital cardiac arrest (OHCA) outcomes but remains uncommon globally. This study evaluated the outcomes of Singapore's 11-year public-access AED expansion and volunteer-responder implementation in terms of trends in BAED use, associated factors, and clinical outcomes. Methods: This population-based, retrospective cohort study used Singapore Pan-Asian Resuscitation Outcomes Study (SG-PAROS) data (2010-2020) for adult, non-traumatic OHCAs. The primary outcome was bystander AED application. Multivariable logistic regression identified factors associated with use. Secondary outcomes included favorable neurological status (CPC 1-2), survival to discharge, and prehospital return of spontaneous circulation (ROSC). Results: Of 21,439 included OHCA cases (median age 70.0 years; 63.8% male), BAED use increased from 1.7% to 9.6% over 11 years, with a corresponding increase in overall survival from 2.4% to 4.0%. Malay ethnicity (aOR 1.25, 1.06-1.49), calendar year (aOR 1.26, 1.22-1.29), and delayed emergency medical services (aOR 1.24, 1.06-1.45) were positive predictors of BAED use. Conversely, BAED use was lower among females (aOR 0.80, 95% CI 0.69-0.94), at night (aOR 0.69, 0.56-0.86), and in residential settings (aOR 0.06, 0.05-0.07). Volunteer arrival strongly increased application (aOR 4.16, 3.41-5.09), with a significant interaction (p<0.001); the effect was greater in residential (aOR 7.38, 5.81-9.38) than non-residential settings (aOR 1.71, 1.22-2.40). AED use predicted favorable neurological outcome (aOR 2.80, 2.24-3.50; NNT 8.7), survival (aOR 2.30, 1.89-2.80), and ROSC (aOR 2.11, 1.81-2.46). Conclusion: Over 11 years, we saw a significant increase in BAED application and favorable neurological survival. This success was associated with the implementation of an integrated strategy combining widespread AED deployment, national training, and smartphone-activated volunteer responders. Singapore's experience provides a scalable model for urban centers seeking to expand their AED strategy.

18
Cross-Model Variability in Large Language Model Triage Behavior for Potential Stroke Symptoms

Dworkis, D. A.; Stenstrom, J.; Sen, A.; Lucarelli, R. T.

2026-05-25 emergency medicine 10.64898/2026.05.22.26353904 medRxiv
Top 0.4%
1.9%

Background: Stroke is a time-sensitive neurological emergency in which early EMS activation and presentation to definitive care are cornerstones of effective therapy. Large language models (LLMs) are increasingly consulted by the public for medical advice, but the veracity of the guidance provided by commercially available models responding to potential stroke symptoms is not well understood. Methods: We performed a cross-model benchmarking study comparing the triage choices of three frontier LLMs (Claude Sonnet 4.6, GPT-4o, and Llama 3.3-70b-versatile) on first-person vignettes describing a unilateral arm symptom on waking, across 10 symptom descriptors, and two clinical phases (before and after a partially reassuring self-examination), with or without a clinical distractor (n=50 per condition). Results: Claude sought emergency care most often, Llama least, and GPT-4o in between, diverging most sharply in the post-examination phase where Claude called 911 in 100% of runs, Llama called for non-emergency help in 100%, and GPT-4o was symptom-dependent. A distractor shifted behavior away from emergency care in almost all conditions: calling 911 fell from 37.9% to 14.6% and waiting rose from 0% to 45.9% in the post-examination vignette. Responses were also sensitive to symptom word: weak, limp, heavy, and clumsy generated higher alarm, whereas numb, tingly, odd, strange, and weird generated less urgent responses. Conclusions: The increasing use of LLMs for medical advice has significant public health implications. Commercially available LLMs show significant model-to-model variability and framing sensitivity when confronted with potential stroke symptoms, including under-recognition of canonical CDC warning descriptors, underscoring the need for systematic benchmarking as these tools become de facto first points of contact for patients experiencing neurological emergencies.

19
Post-ED Trajectory Prediction in Abdominal Pain with a Generative Medical Event Model

McCann, K. A.; Wright, D. S.; Iscoe, M. S.; Melnick, E. R.; Ohno-Machado, L.; Meeker, D.; Venkatesh, A. K.; Sangal, R. B.; Loza, A. J.

2026-05-21 emergency medicine 10.64898/2026.05.18.26353199 medRxiv
Top 0.4%
1.9%

Importance: Abdominal pain causes roughly 10 million US emergency department (ED) visits annually, most resulting in discharge. Post-discharge courses vary, yet existing risk models predict only whether an ED revisit occurs, not what that revisit outcome will entail. Objective: To evaluate whether Curiosity, a generative medical event foundation model, can predict post-ED-discharge trajectories for adults with abdominal pain, differentiating the timing and severity of expected outcomes. Design: Retrospective cohort study; encounters January 1-December 31, 2022; 30-day follow-up; analysis conducted in 2026. Setting: Epic Cosmos research network (multicenter, population-based, de-identified electronic health record). Participants: Adults (≥18 years) discharged from the ED with abdominal pain, excluding training-set patients. Random sample of 3,000 drawn from 150,030 eligible patients (65.3% female; median age 47 years [IQR 36-60]). Exposure: ED discharge after evaluation for abdominal pain. Main Outcomes and Measures: Primary: Curiosity model vs. per-task, separately estimated XGBoost models on area under the receiver operating characteristic curve (AUROC) for ED revisit ending in admission (admit-revisit), ED revisit ending in discharge (DC-revisit), and any ED revisit at 72 hours, 7 days, and 30 days. Secondary: trajectory-level accuracy across 36 trajectory classes and edit distance vs. XGBoost; calibration of simulated vs. observed conditional path probabilities across 45 transitions. Results: Curiosity identified patients at high risk of revisit requiring admission more accurately than XGBoost and differentiated those likely to revisit without admission. Among 3,000 patients, Curiosity's 30-day admit-revisit AUROC was 0.83 (95% CI 0.79-0.87) vs. 0.70 (95% CI 0.65-0.75) for XGBoost (DeLong P<.001), and admit-revisit AUC-PR was 0.37 (95% CI 0.29-0.46) against a 4.1% cohort base rate, vs. XGBoost 0.13 (95% CI 0.09-0.19). Curiosity identified the most likely trajectory out of 36 possibilities for 45.9% of patients (XGBoost 41.0%; McNemar P<.001), with median edit distance 1.28 vs. 1.40 (Wilcoxon P<.001). Median absolute calibration error across 45 transitions was 1.30 percentage points (95% CI 0.32-2.49). Conclusions and Relevance: A generative medical event foundation model produced calibrated trajectory-level predictions and discriminated admit-revisits more effectively than task-specific XGBoost baselines, separating patients who revisited and were admitted from those who revisited and were discharged.
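The abstract above reports an edit distance between predicted and observed trajectories without defining it. As a minimal sketch, assuming a standard Levenshtein distance over event sequences (the abstract does not confirm this definition, and the event names below are hypothetical), the metric could be computed as follows.

```python
def edit_distance(a, b):
    """Levenshtein distance between two sequences of events.

    Counts the minimum number of insertions, deletions, and
    substitutions needed to turn sequence a into sequence b.
    """
    # prev[j] holds the distance between the first i-1 items of a
    # and the first j items of b.
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        curr = [i]
        for j, y in enumerate(b, start=1):
            cost = 0 if x == y else 1
            curr.append(min(prev[j] + 1,          # delete x
                            curr[j - 1] + 1,      # insert y
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


# Hypothetical event labels, for illustration only.
observed = ["ED_DISCHARGE", "ED_REVISIT", "ADMIT"]
predicted = ["ED_DISCHARGE", "ED_REVISIT", "DISCHARGE"]
print(edit_distance(observed, predicted))
```

Under this definition each pairwise distance is an integer, so a non-integer median such as 1.28 would suggest that distances were averaged per patient, for example across simulated trajectories; that reading is an inference, not something the abstract states.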

20
Closing the gaps: Improving physical health diagnosis in the emergency department for patients with mental health conditions

Jayaprakash, A.; Liberati, E.; Lindsay, R.; Willars, J.; Gibson, J.; Fritz, Z.; Price, A.; Hatfield, T.; Richards, N.; Martin, G.

2026-06-08 emergency medicine 10.64898/2026.06.05.26354970 medRxiv
Top 0.4%
1.8%

Objectives: People with mental health conditions experience increased rates of diagnostic errors and delays in acute treatment. While causes such as diagnostic overshadowing (misattribution of physical symptoms to mental health conditions) are well documented, less attention has been paid to the organisational and structural conditions that shape diagnostic work. This study examines how physical illness is diagnosed in patients with mental health conditions in emergency departments (EDs), with a focus on the structural conditions that enable or constrain safe diagnostic practice. Method: We conducted a multi-site ethnography across three purposively selected EDs in England between April 2023 and April 2024, varying in size, population demographics, and local service configuration. Data were collected through 284 hours of non-participant observation and 20 semi-structured interviews with ED staff. Results: Our analysis identified four recurring structural gaps that shaped the conditions under which physical health diagnosis took place for patients with mental health conditions: a design gap, whereby targets and physical layouts constrained diagnostic reasoning; a preparedness gap, reflecting the lack of structural support to allow staff to act on their existing knowledge and skills; a coordination gap, reflecting fragmented ownership and the challenges of joint assessment across mental and physical healthcare teams; and an expectation gap, whereby unmet need elsewhere in the system increased demand for ED services that were beyond its formal scope. These gaps made diagnostic errors and delay more likely for patients with mental health conditions seeking physical healthcare in the ED. Conclusions: As new dedicated mental health EDs are introduced in England, there is an opportunity to avoid reproducing these structural gaps in new settings. Our study suggests that improving physical healthcare for patients with mental health conditions requires changes to how EDs are designed, resourced and supported, and how they connect with the wider health and care system. Keywords: mental health, diagnostic inequality, emergency departments